Aggression and Complexity in Trump’s 2020 Rhetoric
An Analysis of 2020 Presidential Campaign Speeches
Kayla Muller
2025-05-06
Table of Contents
Introduction
Data
Aggression Time Trend
Simplicity Time Trend
Topic Modeling: Aggression
Topic Modeling: Simplicity
Conclusion
Introduction
Research Question:
Is there a correlation between aggression and rhetorical complexity in Donald Trump’s 2020 presidential campaign speeches?
Data
Chalkiadakis, Ioannis; Anglès d’Auriac, Louise; Peters, Gareth; and Frau-Meigs, Divina. A text dataset of campaign speeches of the main tickets in the 2020 US presidential election (September 20, 2024).
This analysis uses Trump’s campaign speeches from the 2020 presidential election to assess whether, and to what extent, aggression correlates with rhetorical simplicity.
The dataset consists of 235 official transcripts of speeches Donald Trump delivered over the course of his 2020 presidential campaign, from January 2019 through January 2021.
Monthly Average Aggression Ratio
Visualizing the trend alongside a two-month rolling average.
Aggression in the 75th Percentile
Visualizing a subset of the 21 most aggressive speeches: those with an aggression ratio above the 75th-percentile threshold of 0.206258.
Rhetoric Complexity
Monthly Average Flesch Score
Simplicity in the 75th Percentile
Speeches with a flesch_score above 68.72: Trump’s simplest speeches. The 75th-percentile subset consists of 59 of the 235 speeches.
Topic Modeling: Aggression in the 75th Percentile
Latent Dirichlet Allocation (LDA) is used to identify the top topics among the 75th percentile of aggressive speeches.
Topic #1: know said peopl want dont say thing great think right
Topic #2: woman nation iran futur countri american terror state continu busi
Topic #3: race state unit sex order nation american feder act agenc
Topic #4: countri american border year biden peopl nation america presid want
Topic #5: iran world unit nation state american china peopl year hong
Topic #6: thank american america nation great peopl state unit child histori
Topic #7: divis holocaust appoint th unit woman act secretari crime day
Topic #1: peopl think good lot thing great number want countri meet
Topic #2: thank peopl great know said want think countri like say
Topic #3: know said peopl dont want year laughter say right great
Topic #4: crowd number great happen mani come know im weve thank
Topic #5: peac heart brother robert wonder memori tonight live best forev
Topic #6: presid trump said know want dont peopl year say biden
WordCloud Top Topics: Simplicity
Conclusion
Summary
Moderate Inverse Relationship
Topics linked to aggression: justice and order, fake news, China, and immigration
Topics linked to linguistic simplicity: his brother’s passing, patriotism, and policy
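The strength of the inverse relationship can be quantified with a correlation coefficient between the aggression ratio and the Flesch score. A minimal sketch using pandas, with a small hypothetical DataFrame standing in for the real Trumpdf columns (the values below are illustrative, not from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for Trumpdf's neg_ratio and flesch_score columns
df = pd.DataFrame({
    "neg_ratio":    [0.31, 0.28, 0.22, 0.18, 0.12, 0.09],
    "flesch_score": [60.1, 62.4, 65.0, 67.2, 70.8, 72.5],
})

# Pearson correlation (the pandas default); a negative value indicates
# that higher-aggression speeches tend to have lower Flesch scores
corr = df["neg_ratio"].corr(df["flesch_score"])
print(round(corr, 3))
```

Spearman rank correlation (`method="spearman"`) may be preferable if the relationship is monotonic but not linear.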
Future Research
Larger Dataset
I recommend using a more diverse selection of documents: tweets, statements made on social media, and transcriptions of video clips.
Trump has been known to make inflammatory remarks about political opponents on social media, so this would be a more precise avenue for a deeper analysis.
```python
import pandas as pd
import json

# Path to your file
file_path = '/Users/KaylaMuller/desktop/text_analysis/week12/cleantext_DonaldTrump.jsonl.txt'

# Read the file line by line and parse each line as JSON
data = []
with open(file_path, 'r', encoding='utf-8') as f:
    for line in f:
        data.append(json.loads(line))

# Turn into a DataFrame
Trumpdf = pd.DataFrame(data)
```
```python
import pandas as pd
import re

# Make sure your list of words is defined
word_list = set(american_words)

# Compile a regex pattern that matches any of the words, word-boundary safe
pattern = re.compile(
    r'\b(' + '|'.join(re.escape(word) for word in word_list) + r')\b',
    re.IGNORECASE
)

# Count matches in each row
Trumpdf["NegativeWordCount"] = Trumpdf["CleanText"].astype(str).apply(
    lambda text: len(pattern.findall(text))
)
Trumpdf["TotalWordCount"] = Trumpdf["CleanText"].astype(str).apply(
    lambda text: len(re.findall(r'\b\w+\b', text))
)
Trumpdf["neg_ratio"] = Trumpdf["NegativeWordCount"] / Trumpdf["TotalWordCount"] * 100

# Ensure the 'Date' column is in datetime format
Trumpdf["Date"] = pd.to_datetime(Trumpdf["Date"], errors="coerce")

# Drop rows where 'Date' is NaT (invalid dates)
Trumpdf = Trumpdf.dropna(subset=["Date"])

# Extract YearMonth as a string (YYYY-MM) for easier handling in ggplot
Trumpdf["YearMonth"] = Trumpdf["Date"].dt.to_period('M').astype(str)

# Calculate the average 'neg_ratio' by 'YearMonth'
monthly_avg_neg_ratio = Trumpdf.groupby("YearMonth")["neg_ratio"].mean().reset_index()

# Export the result to CSV for use in R
monthly_avg_neg_ratio.to_csv("monthly_avg_neg_ratio.csv", index=False)
```
```r
library(reticulate)
library(ggplot2)

# Load the CSV file (make sure you have the correct path to the file)
df <- read.csv("monthly_avg_neg_ratio.csv")

# Convert 'YearMonth' to a date format
df$YearMonth <- as.Date(paste0(df$YearMonth, "-01"))

# Plot the data
ggplot(df, aes(x = YearMonth, y = neg_ratio)) +
  geom_line() +
  labs(title = "Monthly Average Aggression Ratio",
       x = "Month", y = "Aggression Ratio (%)") +
  theme_minimal()
```
```python
# Sort by 'YearMonth' to ensure the rolling average works correctly
monthly_avg_neg_ratio = monthly_avg_neg_ratio.sort_values("YearMonth")

# Calculate the two-month rolling average of 'neg_ratio'
monthly_avg_neg_ratio["TwoMonthRollingAvg"] = (
    monthly_avg_neg_ratio["neg_ratio"].rolling(window=2).mean()
)

# Export the result to CSV for use in R
monthly_avg_neg_ratio.to_csv("monthly_avg_neg_ratio_with_rolling_avg.csv", index=False)
```
```r
library(ggplot2)
library(readr)
library(dplyr)

# Read the data
monthly_avg_neg_ratio <- read_csv("monthly_avg_neg_ratio_with_rolling_avg.csv")

# Convert YearMonth to Date type
monthly_avg_neg_ratio <- monthly_avg_neg_ratio %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))

# Plot with ggplot
ggplot(monthly_avg_neg_ratio, aes(x = Date)) +
  geom_line(aes(y = neg_ratio), color = "blue", linetype = "dashed", size = 1) +
  geom_line(aes(y = TwoMonthRollingAvg), color = "red", size = 1) +
  labs(title = "Monthly Negative Ratio with Two-Month Rolling Average",
       x = "Date",
       y = "Negative Ratio (%)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")
```
Analysis of Aggression in the 75th Percentile
```python
# Subset the DataFrame to select only rows where 'neg_ratio' > 0.206258
subset_df = Trumpdf[Trumpdf["neg_ratio"] > 0.206258]

# Calculate the average 'neg_ratio' by 'YearMonth'
subset_monthly_avg_neg_ratio = subset_df.groupby("YearMonth")["neg_ratio"].mean().reset_index()

# Export the result to CSV for use in R
subset_monthly_avg_neg_ratio.to_csv("monthly_avg_neg_ratio.csv", index=False)
```
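The 0.206258 cutoff appears to be the 75th percentile of `neg_ratio`; it could be computed directly with pandas' `quantile` rather than hard-coded. A sketch with hypothetical values in place of the real column:

```python
import pandas as pd

# Hypothetical stand-in for Trumpdf["neg_ratio"]
neg_ratio = pd.Series([0.10, 0.15, 0.18, 0.21, 0.25, 0.30, 0.12, 0.22])

# 75th-percentile threshold, computed instead of hard-coded
threshold = neg_ratio.quantile(0.75)

# Keep only speeches above the threshold (the top quartile)
top_quartile = neg_ratio[neg_ratio > threshold]
```

The same pattern would replace the hard-coded 68.72 cutoff for the Flesch-score subset.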
```r
library(reticulate)
library(ggplot2)

# Load the CSV file (make sure you have the correct path to the file)
df_with_subset <- read.csv("monthly_avg_neg_ratio.csv")

# Convert 'YearMonth' to a date format
df_with_subset$YearMonth <- as.Date(paste0(df_with_subset$YearMonth, "-01"))

# Plot the data
ggplot(df_with_subset, aes(x = YearMonth, y = neg_ratio)) +
  geom_line() +
  labs(title = "Monthly Average Aggression Ratio for the 75th Percentile",
       x = "Month", y = "Aggression Ratio (%)") +
  theme_minimal()
```
Monthly Average Flesch Score
```python
from textstat import flesch_reading_ease

Trumpdf['flesch_score'] = Trumpdf['CleanText'].apply(flesch_reading_ease)

# Calculate the average 'flesch_score' by 'YearMonth'
monthly_avg_flesch_score = Trumpdf.groupby("YearMonth")["flesch_score"].mean()

# Export the result to CSV for use in R
monthly_avg_flesch_score.to_csv("monthly_avg_flesch_score.csv", index=True)
```
```r
library(ggplot2)
library(readr)
library(dplyr)

# Read the data
monthly_avg_flesch_score <- read_csv("monthly_avg_flesch_score.csv")

# Convert YearMonth to Date type
monthly_avg_flesch_score <- monthly_avg_flesch_score %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))

# Plot with ggplot
ggplot(monthly_avg_flesch_score, aes(x = Date, y = flesch_score)) +
  geom_line(color = "red", size = 1) +
  labs(title = "Monthly Average Flesch Score",
       x = "Date",
       y = "Flesch Score") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")
```
Analysis of Flesch Score Above the 75th Percentile
```python
# Subset the DataFrame to select only rows where 'flesch_score' > 68.72
subset_df_flesch_score = Trumpdf[Trumpdf["flesch_score"] > 68.72]

# Calculate the average 'flesch_score' by 'YearMonth'
subset_monthly_avg_flesch_score = subset_df_flesch_score.groupby("YearMonth")["flesch_score"].mean().reset_index()

# Export the result to CSV for use in R
subset_monthly_avg_flesch_score.to_csv("subset_monthly_avg_flesch_score.csv", index=False)
```
```r
library(ggplot2)
library(readr)
library(dplyr)

# Read the data
subset_monthly_avg_flesch_score <- read_csv("/Users/KaylaMuller/Desktop/text_analysis/week12/subset_monthly_avg_flesch_score.csv")

# Convert YearMonth to Date type
subset_monthly_avg_flesch_score <- subset_monthly_avg_flesch_score %>%
  mutate(Date = as.Date(paste0(YearMonth, "-01")))

# Plot with ggplot
ggplot(subset_monthly_avg_flesch_score, aes(x = Date, y = flesch_score)) +
  geom_line(color = "red", size = 1) +
  labs(title = "Monthly Average Flesch Score for the 75th Percentile",
       x = "Date",
       y = "Flesch Score") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "1 month")
```
Topic Modeling: Aggression in the 75th Percentile
```python
import string
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer

# Step 0: Optional — make a copy to avoid SettingWithCopyWarning
subset_df = subset_df.copy()

# Setup
stop = set(stopwords.words('english'))
stop.add('applause')  # custom stopword
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# Combined cleaning function
def clean_text(text):
    text = text.lower()                                               # lowercase
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    text = re.sub(r'\d+', '', text)                                   # remove numbers
    tokens = word_tokenize(text)                                      # tokenize
    tokens = [word for word in tokens if word not in stop]            # remove stopwords
    tokens = [lemmatizer.lemmatize(word) for word in tokens]          # lemmatization
    tokens = [stemmer.stem(word) for word in tokens]                  # stemming
    return ' '.join(tokens)

# Apply to DataFrame
subset_df['CleanText_transformed'] = subset_df['CleanText'].apply(clean_text)
```
```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Get the feature names (words)
feature_names = vectorizer.get_feature_names_out()

# Loop over each topic
for topic_idx, topic_weights in enumerate(lda.components_):
    # Create dictionary: word -> weight for the top 30 words
    word_freq = {feature_names[i]: topic_weights[i]
                 for i in topic_weights.argsort()[:-31:-1]}

    # Generate the word cloud
    wordcloud = WordCloud(width=800, height=400,
                          background_color='white').generate_from_frequencies(word_freq)

    # Plot the word cloud
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.title(f"Topic #{topic_idx + 1}")
    plt.show()
```
Topic Modeling: Simplicity in the 75th Percentile
```python
import string
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer

# Step 0: Optional — make a copy to avoid SettingWithCopyWarning
subset_df_flesch_score = subset_df_flesch_score.copy()

# Setup
stop = set(stopwords.words('english'))
stop.add('applause')  # custom stopword
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# Combined cleaning function
def clean_text(text):
    text = text.lower()                                               # lowercase
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    text = re.sub(r'\d+', '', text)                                   # remove numbers
    tokens = word_tokenize(text)                                      # tokenize
    tokens = [word for word in tokens if word not in stop]            # remove stopwords
    tokens = [lemmatizer.lemmatize(word) for word in tokens]          # lemmatization
    tokens = [stemmer.stem(word) for word in tokens]                  # stemming
    return ' '.join(tokens)

# Apply to DataFrame
subset_df_flesch_score['CleanText_transformed'] = subset_df_flesch_score['CleanText'].apply(clean_text)
```
```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Get the feature names (words)
feature_names = vectorizer.get_feature_names_out()

# Loop over each topic
for topic_idx, topic_weights in enumerate(lda2.components_):
    # Create dictionary: word -> weight for the top 30 words
    word_freq = {feature_names[i]: topic_weights[i]
                 for i in topic_weights.argsort()[:-31:-1]}

    # Generate the word cloud
    wordcloud = WordCloud(width=800, height=400,
                          background_color='white').generate_from_frequencies(word_freq)

    # Plot the word cloud
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.title(f"Topic #{topic_idx + 1}")
    plt.show()
```